Voter Segmentation in R
This document details all the code and modelling work for this project. The idea is to generate clusters from group of voters using an unsupervised machine learning model. Then, once we have generated those clusters, to analyse those clusters and tease out any insights that can be drawn from them.
Importing the data
The first thing we will need is to import the data and prepare it for analysis. This data is the latest panel of the British Election Study’s longitudinal panel survey. I’ve displayed a section of that data below.
## # A tibble: 6 x 168
## turnout_uk_general like_johnson like_starmer like_con like_lab like_ld
## <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lb> <dbl+lb> <dbl+l>
## 1 5 [Very likely that I wou~ 7 [7] 6 [6] 8 [8] 3 [3] 6 [6]
## 2 5 [Very likely that I wou~ 0 [Strongly~ 6 [6] 0 [Stro~ 6 [6] 0 [Str~
## 3 5 [Very likely that I wou~ 8 [8] 3 [3] 8 [8] 3 [3] 0 [Str~
## 4 4 [Fairly likely] 7 [7] 7 [7] 7 [7] 0 [Stro~ 0 [Str~
## 5 5 [Very likely that I wou~ 0 [Strongly~ 0 [Strongly~ 0 [Stro~ 2 [2] 2 [2]
## 6 5 [Very likely that I wou~ 0 [Strongly~ 4 [4] 2 [2] 5 [5] 5 [5]
## # ... with 162 more variables: like_brexit_party <dbl+lbl>,
## # britishness <dbl+lbl>, europeanness <dbl+lbl>, lr_scale <dbl+lbl>,
## # age <dbl+lbl>, p_gross_personal <dbl+lbl>, p_gross_household <dbl+lbl>,
## # p_eurefturnout <dbl+lbl>, general_election_vote_i_would_did_not_vote <int>,
## # general_election_vote_conservative <int>,
## # general_election_vote_labour <int>,
## # general_election_vote_liberal_democrat <int>, ...
Exploratory analysis
With this data, we can explore some of its features and visualise those findings. For this we’ll look into the left/right alignment of voters by party voting intention.
scales <- list(
list(args = list(
"marker.color", "blue"
),
label = "Conservative",
method = "restyle",
value = "Conservative"
),
list(args = list(
"marker.color", "red"
),
label = "Labour",
method = "restyle",
value = "Labour"
),
list(args = list(
"marker.color", "orange"
),
label = "Liberal",
method = "restyle",
value = "Liberal Democrats"
)
)
lr_vote <- data %>%
filter(
(
general_election_vote_conservative == 1 |
general_election_vote_labour == 1 |
general_election_vote_liberal_democrat == 1
)
) %>%
mutate(party = case_when(
(general_election_vote_conservative == 1) ~ "Conservative",
(general_election_vote_labour == 1) ~ "Labour",
(general_election_vote_liberal_democrat == 1) ~ "Liberal Democrats"
)
) %>%
count(party, lr_scale) %>%
group_by(party) %>%
mutate(prop = prop.table(n)) %>%
mutate(class = case_when(
party == "Conservative" ~ 1,
party == "Labour" ~ 2,
party == "Liberal Democrats" ~ 3
)
) %>%
ungroup()
party_plot <- ggplot(lr_vote, aes(x = lr_scale, y = prop, frame = party, fill = party)) +
geom_col(position = "identity") +
theme_minimal() +
scale_fill_manual(
values = c(
"Conservative" = "blue",
"Labour" = "red",
"Liberal Democrats" = "orange"
)
) +
labs(
x = "Left/Right Scale (0 = Left, 10 = Right)",
y = "Proportion",
title = "Left/Right Alignment of 2019 Voters by Party",
subtitle = "BES respondents by left/right score and 2019 party vote (non-voters not included)"
) +
scale_y_continuous(limits = c(0,0.152), expand = c(0,0), labels = scales::percent_format(accuracy = 1)) +
theme(
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
axis.line.x.bottom = element_line(color = "black"),
legend.position = "none"
)
ggplotly(party_plot)htmlwidgets::saveWidget(ggplotly(party_plot),
file = "html/party_plot.html",
selfcontained = TRUE)Elbow plots
For the K-means clustering algorithm, we will have to decide upon a value of k. In other words, how many clusters we would like to generate. For this, we will be using an elbow plot to decide on the value of K. This plot will visualise the within sum of squares to identify the optimal value of K
#Fit models
tot_withinss <- map_dbl(1:10, function(k){
model <- kmeans(x = data, centers = k, iter.max = 100, nstart = 1000)
model$tot.withinss
})
#Create data frame of model results
elbow_df <- data.frame(
k = 1:10,
tot_withinss = tot_withinss
)
#Create eblow plot
elbow_plot <- ggplot(elbow_df, aes(x = k, y = tot_withinss)) +
geom_line() +
geom_point(aes(color = as.character(k)), size = 4) +
geom_vline(xintercept = 3, linetype = "dashed", color = "black") +
annotate("label", x = 3, y = 7.5e+11, label = "Optimal number of clusters, K = 3", size = 4) +
scale_x_continuous(n.breaks = 10) +
labs(title = "Elbow Plot", subtitle = "Sum of squares for each value of K", y = "Within sum of squares", x = "Value of K") +
theme_classic() +
theme(
axis.line.x.bottom = element_line(color = "black")
, axis.line.y.left = element_line(color = "black")
, legend.position = "none"
, panel.grid.minor.x = element_blank()
)
elbow_plotggsave("Images/elbow_plot.png", elbow_plot, height = 4.47, width = 7.2)Cluster modelling
Next we’ll fit the k-means model to the data. This will take some time to fit, given the size of the data. However, it indicates that 3 clusters is the optimal number of clusters given the data we have.
#fit cluster model
kmean <- kmeans(data, centers = 3, iter.max = 1000, nstart = 10000)
head(kmean$centers)## turnout_uk_general like_johnson like_starmer like_con like_lab
## 1 71.23954 29.14978 138.4821 8.457323 15.60794
## 2 170.51993 89.48788 517.2494 11.354911 13.54512
## 3 663.85590 1549.27083 3248.9062 3474.961806 3978.02431
## like_ld like_brexit_party britishness europeanness lr_scale age
## 1 94.039190 2.165227 99.94279 113.2693 3.184449 57.75714
## 2 3.975128 9999.000000 91.62532 204.1239 3.410953 56.58610
## 3 9877.529514 9495.645833 526.10764 801.4132 3.249132 51.85938
## p_gross_personal p_gross_household p_eurefturnout
## 1 7.787612 9.628031 50.79425
## 2 7.740115 9.846620 88.65099
## 3 7.671875 9.470486 278.69792
## general_election_vote_i_would_did_not_vote general_election_vote_conservative
## 1 0.009133178 0.4003653
## 2 0.008131378 0.4701849
## 3 0.027777778 0.3368056
## general_election_vote_labour general_election_vote_liberal_democrat
## 1 0.2500830 0.06036201
## 2 0.2067921 0.05580357
## 3 0.1753472 0.01041667
## general_election_vote_scottish_national_party_snp
## 1 0.04873796
## 2 0.03380102
## 3 0.02083333
## general_election_vote_plaid_cymru
## 1 0.008468947
## 2 0.002551020
## 3 0.005208333
## general_election_vote_united_kingdom_independence_party_ukip
## 1 0
## 2 0
## 3 0
## general_election_vote_green_party
## 1 0.06957821
## 2 0.04910714
## 3 0.04513889
## general_election_vote_british_national_party_bnp general_election_vote_other
## 1 0 0.015360345
## 2 0 0.007493622
## 3 0 0.012152778
## general_election_vote_change_uk_the_independent_group
## 1 0
## 2 0
## 3 0
## general_election_vote_brexit_party_reform_uk
## 1 0.0282298240
## 2 0.0003188776
## 3 0.0052083333
## general_election_vote_an_independent_candidate
## 1 0
## 2 0
## 3 0
## general_election_vote_dont_know risk_poverty_very_unlikely
## 1 0.1096812 0.3935570
## 2 0.1658163 0.3743622
## 3 0.3611111 0.2569444
## risk_poverty_fairly_unlikely risk_poverty_neither_likely_nor_unlikely
## 1 0.3031385 0.1471272
## 2 0.3136161 0.1506696
## 3 0.3194444 0.1909722
## risk_poverty_fairly_likely risk_poverty_very_likely risk_poverty_dont_know
## 1 0.08211558 0.04508469 0.02897708
## 2 0.08083546 0.04161352 0.03890306
## 3 0.07986111 0.08159722 0.07118056
## risk_unemployment_very_unlikely risk_unemployment_fairly_unlikely
## 1 0.5016606 0.1943706
## 2 0.4748087 0.2179528
## 3 0.3784722 0.2031250
## risk_unemployment_neither_likely_nor_unlikely risk_unemployment_fairly_likely
## 1 0.1495350 0.04873796
## 2 0.1419005 0.04623724
## 3 0.1840278 0.05208333
## risk_unemployment_very_likely risk_unemployment_dont_know country_england
## 1 0.04375623 0.06193955 0.8277150
## 2 0.04129464 0.07780612 0.8721301
## 3 0.04861111 0.13368056 0.8784722
## country_scotland country_wales p_education_no_formal_qualifications
## 1 0.10619396 0.06609100 0.04956825
## 2 0.09279337 0.03507653 0.05628189
## 3 0.06250000 0.05902778 0.07986111
## p_education_youth_training_certificate_skillseekers
## 1 0.001162405
## 2 0.001434949
## 3 0.010416667
## p_education_recognised_trade_apprenticeship_completed
## 1 0.011125872
## 2 0.011001276
## 3 0.006944444
## p_education_clerical_and_commercial p_education_city_guilds_certificate
## 1 0.01461309 0.03503819
## 2 0.01881378 0.03746811
## 3 0.02430556 0.04861111
## p_education_city_guilds_certificate_advanced p_education_onc
## 1 0.02623713 0.010129525
## 2 0.01977041 0.009566327
## 3 0.02951389 0.003472222
## p_education_cse_grades_2_5
## 1 0.01668881
## 2 0.02088648
## 3 0.02604167
## p_education_cse_grade_1_gce_o_level_gcse_school_certificate
## 1 0.1223016
## 2 0.1419005
## 3 0.1371528
## p_education_scottish_ordinary_lower_certificate
## 1 0.005147791
## 2 0.006855867
## 3 0.000000000
## p_education_gce_a_level_or_higher_certificate
## 1 0.1080206
## 2 0.1250000
## 3 0.1250000
## p_education_scottish_higher_certificate
## 1 0.015194288
## 2 0.013552296
## 3 0.008680556
## p_education_nursing_qualification_e_g_sen_srn_scm_rgn
## 1 0.01560943
## 2 0.01785714
## 3 0.03645833
## p_education_teaching_qualification_not_degree p_education_university_diploma
## 1 0.02382929 0.03852541
## 2 0.02933673 0.03125000
## 3 0.02777778 0.03472222
## p_education_university_or_cnaa_first_degree_e_g_ba_b_sc_b_ed
## 1 0.2407008
## 2 0.2300702
## 3 0.1649306
## p_education_university_or_cnaa_higher_degree_e_g_m_sc_ph_d
## 1 0.12147127
## 2 0.09311224
## 3 0.05208333
## p_education_other_technical_professional_or_higher_qualification
## 1 0.1268682
## 2 0.1184630
## 3 0.1423611
## p_education_dont_know p_education_prefer_not_to_say p_ethnicity_white_british
## 1 0.006061109 0.01170707 0.9200432
## 2 0.006058673 0.01132015 0.9339923
## 3 0.017361111 0.02430556 0.8923611
## p_ethnicity_any_other_white_background p_ethnicity_white_and_black_caribbean
## 1 0.02632016 0.001660578
## 2 0.02072704 0.001275510
## 3 0.02430556 0.000000000
## p_ethnicity_white_and_black_african p_ethnicity_white_and_asian
## 1 0.0008302889 0.004068416
## 2 0.0004783163 0.002551020
## 3 0.0000000000 0.005208333
## p_ethnicity_any_other_mixed_background p_ethnicity_indian
## 1 0.004068416 0.007887745
## 2 0.003029337 0.009088010
## 3 0.006944444 0.019097222
## p_ethnicity_pakistani p_ethnicity_bangladeshi
## 1 0.004068416 0.001411491
## 2 0.003029337 0.002072704
## 3 0.005208333 0.006944444
## p_ethnicity_any_other_asian_background p_ethnicity_black_caribbean
## 1 0.002906011 0.003155098
## 2 0.001753827 0.003029337
## 3 0.000000000 0.006944444
## p_ethnicity_black_african p_ethnicity_any_other_black_background
## 1 0.003487214 0.0004981734
## 2 0.002551020 0.0006377551
## 3 0.008680556 0.0017361111
## p_ethnicity_chinese p_ethnicity_other_ethnic_group
## 1 0.003238127 0.005895051
## 2 0.004464286 0.003826531
## 3 0.012152778 0.000000000
## p_ethnicity_prefer_not_to_say p_marital_married
## 1 0.010461641 0.5539688
## 2 0.007493622 0.5455995
## 3 0.010416667 0.4704861
## p_marital_in_a_civil_partnership
## 1 0.008136832
## 2 0.006855867
## 3 0.003472222
## p_marital_separated_but_still_legally_married_or_in_a_civil_partnership
## 1 0.01519429
## 2 0.01690051
## 3 0.02256944
## p_marital_living_with_a_partner_but_neither_married_nor_in_a_civil_partnership
## 1 0.1104284
## 2 0.1128827
## 3 0.1440972
## p_marital_in_a_relationship_but_not_living_together p_marital_single
## 1 0.03155098 0.1565095
## 2 0.03140944 0.1452487
## 3 0.06250000 0.1649306
## p_marital_divorced p_marital_widowed p_paper_read_the_express
## 1 0.07663567 0.04757556 0.017685154
## 2 0.08179209 0.05931122 0.015943878
## 3 0.10416667 0.02777778 0.008680556
## p_paper_read_the_daily_mail_the_scottish_daily_mail
## 1 0.1337595
## 2 0.1492347
## 3 0.1024306
## p_paper_read_the_mirror_daily_record
## 1 0.02499170
## 2 0.02343750
## 3 0.02604167
## p_paper_read_the_daily_star_the_daily_star_of_scotland p_paper_read_the_sun
## 1 0.002656925 0.04383926
## 2 0.002710459 0.04799107
## 3 0.003472222 0.06944444
## p_paper_read_the_daily_telegraph p_paper_read_the_financial_times
## 1 0.05322152 0.005479907
## 2 0.03842474 0.003826531
## 3 0.01562500 0.006944444
## p_paper_read_the_guardian p_paper_read_the_independent p_paper_read_the_times
## 1 0.12487546 0.021172368 0.05679176
## 2 0.07190689 0.012914541 0.04257015
## 3 0.03819444 0.005208333 0.02604167
## p_paper_read_the_scotsman p_paper_read_the_herald_glasgow
## 1 0.002739954 0.0043175025
## 2 0.001434949 0.0009566327
## 3 0.000000000 0.0000000000
## p_paper_read_the_western_mail
## 1 0.0014114912
## 2 0.0001594388
## 3 0.0000000000
## p_paper_read_other_local_daily_morning_newspaper p_paper_read_other_newspaper
## 1 0.02457655 0.04275988
## 2 0.02040816 0.03651148
## 3 0.02083333 0.04166667
## p_paper_read_none
## 1 0.4397210
## 2 0.5315689
## 3 0.6354167
## p_religion_no_i_do_not_regard_myself_as_belonging_to_any_particular_religion
## 1 0.5068084
## 2 0.4789541
## 3 0.5000000
## p_religion_yes_church_of_england_anglican_episcopal
## 1 0.2759050
## 2 0.3010204
## 3 0.2569444
## p_religion_yes_roman_catholic p_religion_yes_presbyterian_church_of_scotland
## 1 0.07065759 0.02980737
## 2 0.06951531 0.02790179
## 3 0.07465278 0.01388889
## p_religion_yes_methodist p_religion_yes_baptist
## 1 0.02208569 0.011374958
## 2 0.02216199 0.010204082
## 3 0.02083333 0.006944444
## p_religion_yes_united_reformed_church p_religion_yes_free_presbyterian
## 1 0.004068416 0.0003321156
## 2 0.003507653 0.0003188776
## 3 0.001736111 0.0017361111
## p_religion_yes_brethren p_religion_yes_judaism p_religion_yes_hinduism
## 1 0.0004981734 0.008053803 0.003653271
## 2 0.0004783163 0.008928571 0.004783163
## 3 0.0000000000 0.008680556 0.008680556
## p_religion_yes_islam p_religion_yes_sikhism p_religion_yes_buddhism
## 1 0.007223514 0.001411491 0.003902358
## 2 0.006696429 0.002232143 0.003985969
## 3 0.010416667 0.006944444 0.006944444
## p_religion_yes_other p_religion_prefer_not_to_say
## 1 0.02084025 0.01477914
## 2 0.02232143 0.01721939
## 3 0.02604167 0.02256944
## p_religion_yes_a_orthodox_christian
## 1 0.004068416
## 2 0.002232143
## 3 0.005208333
## p_religion_yes_pentecostal_e_g_assemblies_of_god_elim_pentecostal_church_new_testament_church_of_god_redeemed_christian_church_of_god
## 1 0.004732647
## 2 0.004783163
## 3 0.013888889
## p_religion_yes_evangelical_a_independent_non_denominational_e_g_fiec_pioneer_vineyard_newfrontiers
## 1 0.009797409
## 2 0.012755102
## 3 0.013888889
## p_past_vote_2015_i_would_did_not_vote p_past_vote_2015_conservative
## 1 0 0.3542843
## 2 0 0.4158163
## 3 0 0.3385417
## p_past_vote_2015_labour p_past_vote_2015_liberal_democrat
## 1 0.3089505 0.09025241
## 2 0.2809311 0.09008291
## 3 0.3298611 0.04687500
## p_past_vote_2015_scottish_national_party_snp p_past_vote_2015_plaid_cymru
## 1 0.04832282 0.007721687
## 2 0.03316327 0.002232143
## 3 0.02777778 0.006944444
## p_past_vote_2015_united_kingdom_independence_party_ukip
## 1 0.11084357
## 2 0.09646046
## 3 0.08854167
## p_past_vote_2015_green_party p_past_vote_2015_british_national_party_bnp
## 1 0.05214215 0.0004151445
## 2 0.04799107 0.0003188776
## 3 0.05555556 0.0000000000
## p_past_vote_2015_other p_past_vote_2015_change_uk_the_independent_group
## 1 0.01461309 0
## 2 0.01275510 0
## 3 0.02083333 0
## p_past_vote_2015_brexit_party_reform_uk
## 1 0
## 2 0
## 3 0
## p_past_vote_2015_an_independent_candidate p_past_vote_2015_dont_know
## 1 0 0.01245433
## 2 0 0.02024872
## 3 0 0.08506944
## p_past_vote_2017_i_would_did_not_vote p_past_vote_2017_conservative
## 1 0 0.4210395
## 2 0 0.4701849
## 3 0 0.3732639
## p_past_vote_2017_labour p_past_vote_2017_liberal_democrat
## 1 0.3633344 0.09299236
## 2 0.3470982 0.08115434
## 3 0.4236111 0.02951389
## p_past_vote_2017_scottish_national_party_snp p_past_vote_2017_plaid_cymru
## 1 0.04392228 0.007306543
## 2 0.02917730 0.003188776
## 3 0.02083333 0.008680556
## p_past_vote_2017_united_kingdom_independence_party_ukip
## 1 0.02889406
## 2 0.02184311
## 3 0.02256944
## p_past_vote_2017_green_party p_past_vote_2017_british_national_party_bnp
## 1 0.02092328 0
## 2 0.01785714 0
## 3 0.03472222 0
## p_past_vote_2017_other p_past_vote_2017_change_uk_the_independent_group
## 1 0.01054467 0
## 2 0.01052296 0
## 3 0.01215278 0
## p_past_vote_2017_brexit_party_reform_uk
## 1 0
## 2 0
## 3 0
## p_past_vote_2017_an_independent_candidate p_past_vote_2017_dont_know
## 1 0 0.01104284
## 2 0 0.01897321
## 3 0 0.07465278
## p_past_vote_2019_i_would_did_not_vote p_past_vote_2019_conservative
## 1 0 0.4336599
## 2 0 0.4904337
## 3 0 0.4045139
## p_past_vote_2019_labour p_past_vote_2019_liberal_democrat
## 1 0.3085354 0.12363002
## 2 0.2812500 0.11814413
## 3 0.3350694 0.06597222
## p_past_vote_2019_scottish_national_party_snp p_past_vote_2019_plaid_cymru
## 1 0.05139489 0.008302889
## 2 0.03507653 0.003348214
## 3 0.02777778 0.012152778
## p_past_vote_2019_united_kingdom_independence_party_ukip
## 1 0.0006642312
## 2 0.0001594388
## 3 0.0017361111
## p_past_vote_2019_green_party p_past_vote_2019_british_national_party_bnp
## 1 0.02989040 0
## 2 0.02551020 0
## 3 0.04861111 0
## p_past_vote_2019_other p_past_vote_2019_change_uk_the_independent_group
## 1 0.01095981 0
## 2 0.01227679 0
## 3 0.01909722 0
## p_past_vote_2019_brexit_party_reform_uk
## 1 0.02208569
## 2 0.01881378
## 3 0.01736111
## p_past_vote_2019_an_independent_candidate p_past_vote_2019_dont_know
## 1 0.004317502 0.006559283
## 2 0.004942602 0.010044643
## 3 0.010416667 0.057291667
## eu_ref_vote_stay_remain_in_the_eu eu_ref_vote_leave_the_eu
## 1 0.4463633 0.4849718
## 2 0.3609694 0.5360332
## 3 0.3281250 0.4652778
## eu_ref_vote_i_would_will_not_vote eu_ref_vote_dont_know
## 1 0.01386583 0.05479907
## 2 0.01674107 0.08625638
## 3 0.03819444 0.16840278
PCA
To visualise the clusters we have identified, we’ll compute a PCA for our data so that we can visualise our clusters. We’ll then extract the coordinate data generated by the PCA and use that to visualise our clusters.
#Create PCA
pca <- prcomp(data)
#Retrieve coordinates of individuals
ind.coord <- as_tibble(get_pca_ind(pca)$coord) %>%
mutate(cluster = factor(kmean$cluster))
head(ind.coord)## # A tibble: 6 x 169
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9 Dim.10 Dim.11
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -3649. -88.9 -47.0 -122. 32.2 75.2 36.4 -5.30 -25.3 -8.44 11.0
## 2 6262. -1371. -71.2 -108. 110. 200. 61.1 45.1 12.1 8.54 6.88
## 3 -3654. -93.2 -45.8 -117. 35.3 76.7 36.3 -4.70 -26.0 -8.20 25.2
## 4 -3648. -93.8 -42.2 -121. 32.3 76.1 36.3 -4.47 -25.0 -9.65 17.0
## 5 -3654. -98.4 -49.1 -120. 29.3 72.8 37.2 -12.3 -25.7 -3.50 -9.15
## 6 -3654. -92.4 -48.3 -119. 29.4 73.3 37.8 -9.85 -21.4 -3.23 -11.0
## # ... with 158 more variables: Dim.12 <dbl>, Dim.13 <dbl>, Dim.14 <dbl>,
## # Dim.15 <dbl>, Dim.16 <dbl>, Dim.17 <dbl>, Dim.18 <dbl>, Dim.19 <dbl>,
## # Dim.20 <dbl>, Dim.21 <dbl>, Dim.22 <dbl>, Dim.23 <dbl>, Dim.24 <dbl>,
## # Dim.25 <dbl>, Dim.26 <dbl>, Dim.27 <dbl>, Dim.28 <dbl>, Dim.29 <dbl>,
## # Dim.30 <dbl>, Dim.31 <dbl>, Dim.32 <dbl>, Dim.33 <dbl>, Dim.34 <dbl>,
## # Dim.35 <dbl>, Dim.36 <dbl>, Dim.37 <dbl>, Dim.38 <dbl>, Dim.39 <dbl>,
## # Dim.40 <dbl>, Dim.41 <dbl>, Dim.42 <dbl>, Dim.43 <dbl>, Dim.44 <dbl>, ...
#Generate the percentages of the variance explained by the dimensions
eigenvalue <- as_tibble(round(get_eigenvalue(pca), 1))
variance.percent <- eigenvalue$variance.percent
#Show first 6 rows
head(eigenvalue)## # A tibble: 6 x 3
## eigenvalue variance.percent cumulative.variance.percent
## <dbl> <dbl> <dbl>
## 1 23354263. 62 62
## 2 5223253. 13.9 75.8
## 3 2715465 7.2 83
## 4 1955960. 5.2 88.2
## 5 1231119. 3.3 91.5
## 6 1108664. 2.9 94.4
PCA visualisation
With the PCA coordinate data we can also generate the 2D cluster visualisation plot to visualise our clusters.
#Create PCA viz
pca_viz <- ggpubr::ggscatter(
ind.coord, x = "Dim.1", y = "Dim.2",
color = "cluster", palette = "npg", ellipse = TRUE, ellipse.type = "convex",
size = 1.5, ggtheme = theme_minimal(),
xlab = paste0("Comp 1 (", variance.percent[1], "% )" ),
ylab = paste0("Comp 2 (", variance.percent[2], "% )" )
) +
ggpubr::stat_mean(aes(color = cluster), size = 4) +
labs(
title = "Voter Clusters by Component"
, subtitle = "Voter clusters by 2 largest PCA components"
, color = "Cluster"
, fill = "Cluster"
) +
scale_color_manual(values = c(
"1" = "#FF8000"
, "2" = "#00CC00"
, "3" = "#6600CC"
)
) +
theme_classic() +
theme(
legend.title = element_text(size = 12, face = "bold")
, legend.position = "right"
, legend.background = element_rect(color = "black", size = .5)
)
ggsave("images/pca_plot.png", height = 4.47, width = 7.2)
pca_vizPCA 3D Visualisation
Given the overlap, we can increase the dimensions used to visualise the clusters to better identify the differences between clusters. Once this 3D plot is generated, we’ll then save it as a html widget.
#Join colour data
colors <- tibble(cluster = 1:3, colors = c("#FF8000", "#00CC00", "#6600CC"))
ind.coord <- ind.coord %>%
mutate(cluster = as.numeric(cluster)) %>%
left_join(colors)
#Create 3D plot
td_pca <- plot3d(
x=ind.coord$Dim.1, y=ind.coord$Dim.2, z=ind.coord$Dim.3,
col = ind.coord$colors,
type = 'p',
radius = .3,
xlab=paste0("Comp 1 (", variance.percent[1], "% )" ),
ylab=paste0("Comp 2 (", variance.percent[2], "% )" ),
zlab=paste0("Comp 3 (", variance.percent[3], "% )" )
)
rglwidget()#Svae as html widget
htmlwidgets::saveWidget(
widget = rglwidget(width = 520, height = 520),
file = "html/3d_cluster.html",
selfcontained = TRUE
)Cluster Exploration
The following code chunks will analyse the clusters we’ve fitted to explore the data. First though, we’ll extract the data for the density plot of the lr_scale, by cluster.
#Add cluster identifiers to voter data
data <- data %>%
mutate(cluster = kmean$cluster)
#Create density plot
lr_plot <- ggplot(data, aes(x = lr_scale, fill = as.character(cluster))) +
geom_density(alpha = .4) +
labs(
x = "Left/Right Scale (0 = Left, 10 = Right)"
, y = "Proportion"
, title = "Cluster Political Alignment"
, subtitle = "Left-right alignment of voters by cluster"
, fill = "Cluster"
) +
scale_fill_manual(values = c(
"1" = "#FF8000"
, "2" = "#00CC00"
, "3" = "#6600CC"
)
) +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(
expand = c(0,0)
, labels = scales::percent_format(accuracy = 1)
) +
theme_classic() +
theme(
axis.line.x.bottom = element_line()
, panel.grid.major.y = element_line()
, panel.grid.minor.y = element_blank()
, panel.grid.minor.x = element_blank()
, legend.title = element_text(size = 12, face = "bold")
, legend.position = "right"
, legend.background = element_rect(color = "black", size = .5)
, axis.text.y = element_text(size = 8)
)
ggsave("images/lr_plot.png", height = 4.47, width = 7.2)
lr_plotCluster 1: ‘The Liberal Cosmopolitans’
First we’ll get the proportions from the data for this cluster. Given that it appears to be the most left-leaning, we’ll take the proportions for labour vote, earnings educational qualifications, guardian reading, and remain voting.
#Extract varaibles of interest
cluster_1 <- kmean$centers %>%
as_tibble() %>%
mutate(
cluster = as.character(1:3)
, uni_educated = p_education_university_or_cnaa_first_degree_e_g_ba_b_sc_b_ed + p_education_university_or_cnaa_higher_degree_e_g_m_sc_ph_d
) %>%
select(
cluster
, general_election_vote_labour
, eu_ref_vote_stay_remain_in_the_eu
, risk_poverty_very_unlikely
, risk_unemployment_very_unlikely
, uni_educated
, p_paper_read_the_guardian
) %>%
reshape2::melt(id = "cluster")
#Create cluster plot
cluster_1_plot <- ggplot(cluster_1, aes(y = variable, x = value, group = cluster, fill = cluster)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
x = "Proportion"
, y = NULL
, title = "Cluster 1 Categories of Interest"
, subtitle = "Proportion of voters falling into each category, by cluster"
, fill = "Cluster"
) +
scale_x_continuous(
labels = scales::percent_format(accuracy = 1)
, limits = c(0, 0.54)
, expand = c(0,0)
) +
scale_y_discrete(labels = c(
"Will vote Labour"
, "Voted Remain"
, "Low risk of poverty"
, "Low risk unemployment"
, "University educated"
, "Guardian readers"
)
) +
scale_fill_manual(values = c(
"1" = "#FF8000"
, "2" = "#00CC00"
, "3" = "#6600CC"
)
) +
theme_classic() +
theme(
axis.line.x.bottom = element_blank()
, panel.grid.major.x = element_line()
, panel.grid.minor.x = element_blank()
, panel.grid.minor.y = element_blank()
, legend.title = element_text(size = 12, face = "bold")
, legend.position = "right"
, legend.background = element_rect(color = "black", size = .5)
, axis.text.y = element_text(size = 8)
)
ggsave("images/c1_plot.png", height = 4.47, width = 7.2)
cluster_1_plotCluster 2: ‘The Tory Loyalists’
For the analysis of this cluster we’ll first extract the household and personal income groupings to investiagte our hunch that this cluster is the most economically secure
#Extract income data by cluster
cluster_income <- data %>%
select(
cluster
, p_gross_personal
, p_gross_household
) %>%
filter(
#p_gross_personal < 15
, p_gross_household < 16
) %>%
group_by(cluster) %>%
summarise(
#prop_p = sum(p_gross_personal > 10)/n()
, prop_h = sum(p_gross_household > 10)/n() * 100
) %>%
mutate(cluster = as.factor(cluster)) %>%
rename(
Cluster = cluster
, `Proportion of cluster in households earning more than £50,000 (%)` = prop_h
)
#Present data in table
knitr::kable(
cluster_income
)| Cluster | Proportion of cluster in households earning more than £50,000 (%) |
|---|---|
| 1 | 26.06519 |
| 2 | 26.65531 |
| 3 | 21.77033 |
With this extracted, we can then analyse the other features that make cluster 2 stand out. For this we’ll be extracting their prior voting history and religiosity.
#Extract variables of interest
cluster_2 <- kmean$centers %>%
as_tibble() %>%
mutate(
cluster = as.character(1:3)
) %>%
group_by(cluster) %>%
mutate(
christian = sum(
p_religion_yes_church_of_england_anglican_episcopal
, p_religion_yes_roman_catholic
, p_religion_yes_presbyterian_church_of_scotland
, p_religion_yes_methodist
, p_religion_yes_baptist
, p_religion_yes_united_reformed_church
, p_religion_yes_free_presbyterian
, p_religion_yes_brethren
, p_religion_yes_a_orthodox_christian
, p_religion_yes_pentecostal_e_g_assemblies_of_god_elim_pentecostal_church_new_testament_church_of_god_redeemed_christian_church_of_god
, p_religion_yes_evangelical_a_independent_non_denominational_e_g_fiec_pioneer_vineyard_newfrontiers
)
) %>%
select(
cluster
, general_election_vote_conservative
, p_past_vote_2019_conservative
, p_past_vote_2017_conservative
, p_past_vote_2015_conservative
, eu_ref_vote_leave_the_eu
, christian
) %>%
pivot_longer(
cols = c(general_election_vote_conservative
, p_past_vote_2019_conservative
, p_past_vote_2017_conservative
, p_past_vote_2015_conservative
, eu_ref_vote_leave_the_eu
, christian)
, names_to = "names"
, values_to = "values"
)
#Create cluster bar plot
cluster_2_plot <- ggplot(cluster_2, aes(y = names, x = values, group = cluster, fill = cluster)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
x = "Proportion"
, y = NULL
, title = "Cluster 2 Categories of Interest"
, subtitle = "Proportion of voters falling into each category by cluster"
, fill = "Cluster"
) +
scale_x_continuous(
labels = scales::percent_format(accuracy = 1)
, limits = c(0, 0.54)
, expand = c(0,0)
) +
scale_y_discrete(labels = c(
"Christian"
, "Voted Leave"
, "Voted Conservative in 2015"
, "Voted Conservative in 2017"
, "Voted Conservative in 2019"
, "Will vote Conservative"
)
) +
scale_fill_manual(values = c(
"1" = "#FF8000"
, "2" = "#00CC00"
, "3" = "#6600CC"
)
) +
theme_classic() +
theme(
axis.line.x.bottom = element_blank()
, panel.grid.major.x = element_line()
, panel.grid.minor.x = element_blank()
, panel.grid.minor.y = element_blank()
, legend.title = element_text(size = 12, face = "bold")
, legend.position = "right"
, legend.background = element_rect(color = "black", size = .5)
, axis.text.y = element_text(size = 8)
)
ggsave("images/c2_plot.png", height = 4.47, width = 7.2)
cluster_2_plotCluster 3: ‘The Politically Uninterested’
This is the more politically apathetic cluster of the three. For the analysis of this cluster, we will extract proportions for non-White British ethnicity, divorce, and ‘don’t knows’ in deciding who to vote for.
#Extract variables of interest
cluster_3 <- kmean$centers %>%
as_tibble() %>%
mutate(
cluster = as.character(1:3)
) %>%
group_by(cluster) %>%
mutate(
non_white_british = 1 - sum(p_ethnicity_white_british, p_ethnicity_any_other_white_background)
) %>%
select(
cluster
, non_white_british
, p_marital_divorced
, p_education_no_formal_qualifications
, p_paper_read_the_sun
, p_paper_read_none
, general_election_vote_dont_know
, eu_ref_vote_dont_know
) %>%
pivot_longer(
cols = c(p_marital_divorced
, p_education_no_formal_qualifications
, p_paper_read_the_sun
, p_paper_read_none
, non_white_british
, general_election_vote_dont_know
, eu_ref_vote_dont_know)
, names_to = "names"
, values_to = "values"
)
#Create plot for cluster 3
cluster_3_plot <- ggplot(cluster_3, aes(y = names, x = values, group = cluster, fill = cluster)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
x = "Proportion"
, y = NULL
, title = "Cluster 3 Categories of Interest"
, subtitle = "Proportion of voters falling into each category by cluster"
, fill = "Cluster"
) +
scale_x_continuous(
labels = scales::percent_format(accuracy = 1)
#, limits = c(0, 0.54)
, expand = c(0,0)
) +
scale_y_discrete(labels = c(
"Don't know how they voted in the EU referendum"
, "Don't know who they'd vote for in an election"
, "Not White British"
, "No formal qualifications"
, "Divorced"
, "Don't read any newspapers"
, "Sun readers"
)
) +
scale_fill_manual(values = c(
"1" = "#FF8000"
, "2" = "#00CC00"
, "3" = "#6600CC"
)
) +
theme_classic() +
theme(
axis.line.x.bottom = element_blank()
, panel.grid.major.x = element_line()
, panel.grid.minor.x = element_blank()
, panel.grid.minor.y = element_blank()
, legend.title = element_text(size = 12, face = "bold")
, legend.position = "right"
, legend.background = element_rect(color = "black", size = .5)
, axis.text.y = element_text(size = 8)
)
ggsave("images/c3_plot.png", height = 4.47, width = 7.2)
cluster_3_plot